Skip to content

OCPBUGS-77827: fix(api): add missing has() guards to servingCerts CEL validation rule#8331

Open
rutvik23 wants to merge 2 commits intoopenshift:mainfrom
rutvik23:OCPBUGS-77827
Open

OCPBUGS-77827: fix(api): add missing has() guards to servingCerts CEL validation rule#8331
rutvik23 wants to merge 2 commits intoopenshift:mainfrom
rutvik23:OCPBUGS-77827

Conversation

@rutvik23
Copy link
Copy Markdown
Contributor

@rutvik23 rutvik23 commented Apr 24, 2026

What this PR does / why we need it:

  • Adds missing has() guards to the CEL validation rule that checks whether an APIServer loadBalancer.hostname conflicts with servingCerts.namedCertificates[].names

  • The rule was introduced in OCPBUGS-56725: Lb hostname certs cel #6194 (OCPBUGS-56725) but assumed servingCerts and namedCertificates would always be present. In practice:

    • On create: servingCerts is a non-pointer struct (serialized as servingCerts: {}),
      but namedCertificates (omitempty slice) is absent → no such key: namedCertificates
    • On patch/upgrade: Stored objects from older CRD versions may not have servingCerts at
      all → no such key: servingCerts
    • This blocks any user who sets loadBalancer.hostname on the APIServer service with
      configuration.apiServer (e.g. additionalCORSAllowedOrigins) but without servingCerts.

Design decision

  • CEL uses short-circuit evaluation within && chains. Adding has() guards for servingCerts, namedCertificates,
    and cert.names before accessing them is the CEL pattern to handle optional fields safely. This is consistent with the existing has() guards already in the same rule for loadBalancer, configuration, and apiServer.

Which issue(s) this PR fixes:

Fixes OCPBUGS-77827

Special notes for your reviewer:

  • Three has() guards were added: has(self.configuration.apiServer.servingCerts),
    has(self.configuration.apiServer.servingCerts.namedCertificates), and has(cert.names).
  • Backport to affected release branches (4.20+) likely needed.

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes envtest tests.

Summary by CodeRabbit

  • Bug Fixes
    • Strengthened validation for API server certificate settings to handle optional fields safely and avoid invalid load balancer hostname assignments, reducing configuration errors and improving reliability.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 24, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 24, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Apr 24, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@rutvik23: This pull request references Jira Issue OCPBUGS-77827, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (yli2@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

What this PR does / why we need it:

  • Adds missing has() guards to the CEL validation rule that checks whether an APIServer loadBalancer.hostname conflicts with servingCerts.namedCertificates[].names

  • The rule was introduced in OCPBUGS-56725: Lb hostname certs cel #6194 (OCPBUGS-56725) but assumed servingCerts and namedCertificates would always be present. In practice:

  • On create: servingCerts is a non-pointer struct (serialized as servingCerts: {}),
    but namedCertificates (omitempty slice) is absent → no such key: namedCertificates

  • On patch/upgrade: Stored objects from older CRD versions may not have servingCerts at
    all → no such key: servingCerts

  • This blocks any user who sets loadBalancer.hostname on the APIServer service with
    configuration.apiServer (e.g. additionalCORSAllowedOrigins) but without servingCerts.

Design decision

  • CEL uses short-circuit evaluation within && chains. Adding has() guards for servingCerts, namedCertificates,
    and cert.names before accessing them is the CEL pattern to handle optional fields safely. This is consistent with the existing has() guards already in the same rule for loadBalancer, configuration, and apiServer.

Which issue(s) this PR fixes:

Fixes OCPBUGS-77827

Special notes for your reviewer:

  • Three has() guards were added: has(self.configuration.apiServer.servingCerts),
    has(self.configuration.apiServer.servingCerts.namedCertificates), and has(cert.names).
  • Backport to affected release branches (4.20+) likely needed.

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes envtest tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: c40ce5fb-2937-451c-858f-8a6067382824

📥 Commits

Reviewing files that changed from the base of the PR and between abd2876 and 652a104.

⛔ Files ignored due to path filters (17)
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AAA_ungated.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDC.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/GCPPlatform.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HCPEtcdBackup.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ImageStreamImportMode.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/KMSEncryptionProvider.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/OpenStack.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/TLSAdherence.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • cmd/install/assets/crds/hypershift-operator/tests/hostedclusters.hypershift.openshift.io/stable.hostedclusters.services.testsuite.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
📒 Files selected for processing (1)
  • api/hypershift/v1beta1/hostedcluster_types.go

📝 Walkthrough

Walkthrough

A kubebuilder XValidation rule in the HostedClusterSpec type has been refined to improve robustness. The CEL expression that validates the APIServer service publishing strategy now includes explicit existence checks (has(...)) for optional intermediate fields (configuration.apiServer.servingCerts and nested certificate structures) before attempting to traverse them. Additionally, the rule now guards against missing cert.names before performing the nested comparison between certificate names and the load balancer hostname.


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 inconclusive)

Check name Status Explanation Resolution
Stable And Deterministic Test Names ❌ Error Dynamic test names found in test/envtest/generator.go: It() uses fmt.Sprintf with featureSet variable and Describe() uses generateSuiteName() with CRD data. Test names must be static and descriptive. Replace dynamic test names with static strings. Move dynamic content from titles to test bodies. Use "should install all CRDs for feature set" instead of fmt.Sprintf with variables.
Test Structure And Quality ❓ Inconclusive No result was produced after verification. Marking as INCONCLUSIVE. Re-run the check or adjust instructions to produce a final result.
Ote Binary Stdout Contract ❓ Inconclusive No result was produced after verification. Marking as INCONCLUSIVE. Re-run the check or adjust instructions to produce a final result.
✅ Passed checks (9 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding missing CEL validation rule has() guards to the servingCerts validation logic, which is the core modification to hostedcluster_types.go.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests are added. The PR adds CRD validation test cases in YAML format, not Ginkgo tests. The check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR adds envtest Ginkgo tests for HostedCluster CRD CEL validation. Tests validate API field validation rules, not cluster infrastructure or node topology. Fully compatible with Single Node OpenShift.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies API type definitions with CEL validation guards only. No deployment manifests, operator code, controllers, or scheduling constraints introduced.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. Changes are limited to API type definitions and a standard Go unit test for JSON serialization compatibility. The check does not apply.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci-robot
Copy link
Copy Markdown

@rutvik23: This pull request references Jira Issue OCPBUGS-77827, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (yli2@redhat.com), skipping review request.

Details

In response to this:

What this PR does / why we need it:

  • Adds missing has() guards to the CEL validation rule that checks whether an APIServer loadBalancer.hostname conflicts with servingCerts.namedCertificates[].names

  • The rule was introduced in OCPBUGS-56725: Lb hostname certs cel #6194 (OCPBUGS-56725) but assumed servingCerts and namedCertificates would always be present. In practice:

  • On create: servingCerts is a non-pointer struct (serialized as servingCerts: {}),
    but namedCertificates (omitempty slice) is absent → no such key: namedCertificates

  • On patch/upgrade: Stored objects from older CRD versions may not have servingCerts at
    all → no such key: servingCerts

  • This blocks any user who sets loadBalancer.hostname on the APIServer service with
    configuration.apiServer (e.g. additionalCORSAllowedOrigins) but without servingCerts.

Design decision

  • CEL uses short-circuit evaluation within && chains. Adding has() guards for servingCerts, namedCertificates,
    and cert.names before accessing them is the CEL pattern to handle optional fields safely. This is consistent with the existing has() guards already in the same rule for loadBalancer, configuration, and apiServer.

Which issue(s) this PR fixes:

Fixes OCPBUGS-77827

Special notes for your reviewer:

  • Three has() guards were added: has(self.configuration.apiServer.servingCerts),
    has(self.configuration.apiServer.servingCerts.namedCertificates), and has(cert.names).
  • Backport to affected release branches (4.20+) likely needed.

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes envtest tests.

Summary by CodeRabbit

  • Bug Fixes

  • Enhanced validation checks for API server configuration to safely handle optional fields and prevent validation errors.

  • Tests

  • Minor test formatting adjustment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added area/api Indicates the PR includes changes for the API area/cli Indicates the PR includes changes for CLI area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Apr 24, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
api/hypershift/v1beta1/hostedcluster_types.go (1)

527-527: Guards correctly fix the reported "no such key" failures; consider hoisting the configuration has() checks out of services.exists() for readability and CEL cost.

The added has() guards address the issue correctly: thanks to && short-circuit evaluation, self.configuration.apiServer.servingCerts.namedCertificates and cert.names are only dereferenced after their existence is confirmed. This handles both failure modes: "no such key: servingCerts" on upgrade-from-older-CRD, and "no such key: namedCertificates" on create.

One optional refinement: the four has(self.configuration...) checks do not depend on s and are currently re-evaluated for every services entry inside the exists predicate. Hoisting them out keeps the inner predicate focused on the per-service check, is easier to read, and reduces CEL cost (relevant given the existing cost-budget caveat noted on Line 669).

♻️ Proposed refactor (equivalent behavior, cheaper and more readable)
-// +kubebuilder:validation:XValidation:rule=`!self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.servicePublishingStrategy.loadBalancer.hostname != "" && has(self.configuration) && has(self.configuration.apiServer) && has(self.configuration.apiServer.servingCerts) && has(self.configuration.apiServer.servingCerts.namedCertificates) && self.configuration.apiServer.servingCerts.namedCertificates.exists(cert, has(cert.names) && cert.names.exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]"
+// +kubebuilder:validation:XValidation:rule=`!has(self.configuration) || !has(self.configuration.apiServer) || !has(self.configuration.apiServer.servingCerts) || !has(self.configuration.apiServer.servingCerts.namedCertificates) || !self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.servicePublishingStrategy.loadBalancer.hostname != "" && self.configuration.apiServer.servingCerts.namedCertificates.exists(cert, has(cert.names) && cert.names.exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/hypershift/v1beta1/hostedcluster_types.go` at line 527, Hoist the
configuration existence guards out of the self.services.exists() predicate:
first check has(self.configuration), has(self.configuration.apiServer),
has(self.configuration.apiServer.servingCerts) and
has(self.configuration.apiServer.servingCerts.namedCertificates) once before
calling self.services.exists(), then keep the inner predicate focused on
per-service fields (e.g. s.service == 'APIServer' &&
has(s.servicePublishingStrategy.loadBalancer) &&
s.servicePublishingStrategy.loadBalancer.hostname != "" &&
self.configuration.apiServer.servingCerts.namedCertificates.exists(cert,
has(cert.names) && cert.names.exists(n, n ==
s.servicePublishingStrategy.loadBalancer.hostname))). This preserves the current
short-circuit safety for cert.names while reducing repeated evaluation of the
four has(...) checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@api/hypershift/v1beta1/hostedcluster_types.go`:
- Line 527: Hoist the configuration existence guards out of the
self.services.exists() predicate: first check has(self.configuration),
has(self.configuration.apiServer),
has(self.configuration.apiServer.servingCerts) and
has(self.configuration.apiServer.servingCerts.namedCertificates) once before
calling self.services.exists(), then keep the inner predicate focused on
per-service fields (e.g. s.service == 'APIServer' &&
has(s.servicePublishingStrategy.loadBalancer) &&
s.servicePublishingStrategy.loadBalancer.hostname != "" &&
self.configuration.apiServer.servingCerts.namedCertificates.exists(cert,
has(cert.names) && cert.names.exists(n, n ==
s.servicePublishingStrategy.loadBalancer.hostname))). This preserves the current
short-circuit safety for cert.names while reducing repeated evaluation of the
four has(...) checks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 09370bd4-c0e3-477e-8ee3-6bf13e80a1bf

📥 Commits

Reviewing files that changed from the base of the PR and between 3e0e06f and b3dba2f.

⛔ Files ignored due to path filters (18)
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AAA_ungated.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AutoNodeKarpenter.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDC.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/GCPPlatform.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HCPEtcdBackup.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ImageStreamImportMode.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/KMSEncryptionProvider.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/OpenStack.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • cmd/install/assets/crds/hypershift-operator/tests/hostedclusters.hypershift.openshift.io/stable.hostedclusters.services.testsuite.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_types.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (2)
  • api/hypershift/v1beta1/hostedcluster_types.go
  • test/e2e/nodepool_test.go

@rutvik23
Copy link
Copy Markdown
Contributor Author

/cc @enxebre /cc @jparrill /cc @bryan-cox

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 24, 2026

@rutvik23: GitHub didn't allow me to request PR reviews from the following users: /cc.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

/cc @enxebre /cc @jparrill /cc @bryan-cox

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 37.44%. Comparing base (7ac2953) to head (652a104).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8331   +/-   ##
=======================================
  Coverage   37.44%   37.44%           
=======================================
  Files         751      751           
  Lines       91969    91969           
=======================================
  Hits        34435    34435           
  Misses      54894    54894           
  Partials     2640     2640           
Flag Coverage Δ
cmd-support 32.63% <ø> (ø)
cpo-hostedcontrolplane 36.48% <ø> (ø)
cpo-other 37.73% <ø> (ø)
hypershift-operator 47.93% <ø> (ø)
other 27.77% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@jparrill jparrill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!. Dropped a comment

- anything
- kas.duplicated.hostname.com
expectedError: "loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates"
- name: When loadBalancer hostname is set and apiServer has no servingCerts it should pass
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now in the envtest, is only covered the first has(). Do you mind add two more test cases to independently exercise each has() guard:

  • "When loadBalancer hostname is set and apiServer has servingCerts but no namedCertificates it should pass" -- set servingCerts: {} without namedCertificates. This exercises the has(self.configuration.apiServer.servingCerts.namedCertificates) guard.
  • "When loadBalancer hostname is set and namedCertificates entry has no names it should pass" -- add a namedCertificates entry with only servingCertificate and no names field. This exercises the has(cert.names) guard.

@enxebre
Copy link
Copy Markdown
Member

enxebre commented Apr 27, 2026

Thanks! I think this requires several fixes:

On create: servingCerts is a non-pointer struct (serialized as servingCerts: {}),

Can you please report/PR against o/api to make sure ServingCerts APIServerServingCerts json:"servingCerts" uses omitzero.

Also it seems NamedCertificates should be required here:

type APIServerServingCerts struct {
	// namedCertificates references secrets containing the TLS cert info for serving secure traffic to specific hostnames.
	// If no named certificates are provided, or no named certificates match the server name as understood by a client,
	// the defaultServingCertificate will be used.
	// +optional
	// +listType=atomic
	// +kubebuilder:validation:MaxItems=32
	NamedCertificates []APIServerNamedServingCert `json:"namedCertificates,omitempty"`
}

Then in this repo, is there a real use case to allow loadBalancer.hostname with out configuration.apiServer.servingCerts.namedCertificates[0].Names? if there isn't, then this can be required by the cel rule

cc @JoelSpeed

@JoelSpeed
Copy link
Copy Markdown
Contributor

Can you please report/PR against o/api to make sure ServingCerts APIServerServingCerts json:"servingCerts" uses omitzero.

This API is old, changing the serialization now is not an option. Adding omitzero to newer clients will cause conflicts between old and new clients as some would attempt to drop the empty {}, and some would try to add it back. In general we don't change the omitzero/omitempty once an API has reached v1

@rutvik23
Copy link
Copy Markdown
Contributor Author

Then in this repo, is there a real use case to allow loadBalancer.hostname with out configuration.apiServer.servingCerts.namedCertificates[0].Names? if there isn't, then this can be required by the cel rule

It seems the users are allowed to set loadBalancer.hostname for DNS resolution (e.g., with external-dns creating a record for (kas.example.com) while using the default operator-managed serving certificates and the hostname just controls where DNS points. The affected user had loadBalancer.hostname + configuration.apiServer (for CORS/TLS/Audit profile) but no servingCerts and the validation failed when HC was patched.

@enxebre
Copy link
Copy Markdown
Member

enxebre commented Apr 29, 2026

/approve

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre, rutvik23

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 29, 2026
Copy link
Copy Markdown
Contributor

@jparrill jparrill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2026
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@rutvik23 rutvik23 marked this pull request as ready for review April 29, 2026 11:30
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 29, 2026
@openshift-ci openshift-ci Bot requested a review from sjenning April 29, 2026 11:30
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 29, 2026

New changes are detected. LGTM label has been removed.

@hypershift-jira-solve-ci
Copy link
Copy Markdown

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2049442096997208064 | Cost: $2.3437425 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@hypershift-jira-solve-ci
Copy link
Copy Markdown

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2049442096770715648 | Cost: $2.4153965 | Failed step: hypershift-azure-run-e2e

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@hypershift-jira-solve-ci
Copy link
Copy Markdown

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2049442096867184640 | Cost: $2.563419999999999 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 29, 2026

@rutvik23: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aks-4-22 f0dda0e link true /test e2e-aks-4-22

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

// +kubebuilder:validation:XValidation:rule=`self.platform.type == "Azure" ? self.services.exists(s, s.service == "Ignition" && s.servicePublishingStrategy.type == "Route") : true`,message="Azure platform requires Ignition to use Route service publishing strategy"
// +kubebuilder:validation:XValidation:rule=`has(self.issuerURL) || !has(self.serviceAccountSigningKey)`,message="If serviceAccountSigningKey is set, issuerURL must be set"
// +kubebuilder:validation:XValidation:rule=`!self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.servicePublishingStrategy.loadBalancer.hostname != "" && has(self.configuration) && has(self.configuration.apiServer) && self.configuration.apiServer.servingCerts.namedCertificates.exists(cert, cert.names.exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]"
// +kubebuilder:validation:XValidation:rule=`!self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.servicePublishingStrategy.loadBalancer.hostname != "" && has(self.configuration) && has(self.configuration.apiServer) && has(self.configuration.apiServer.servingCerts) && has(self.configuration.apiServer.servingCerts.namedCertificates) && self.configuration.apiServer.servingCerts.namedCertificates.exists(cert, has(cert.names) && cert.names.exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered using optional types here to remove all the has blocks

Suggested change
// +kubebuilder:validation:XValidation:rule=`!self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.servicePublishingStrategy.loadBalancer.hostname != "" && has(self.configuration) && has(self.configuration.apiServer) && has(self.configuration.apiServer.servingCerts) && has(self.configuration.apiServer.servingCerts.namedCertificates) && self.configuration.apiServer.servingCerts.namedCertificates.exists(cert, has(cert.names) && cert.names.exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]"
// +kubebuilder:validation:XValidation:rule=`!self.services.exists(s, s.service == 'APIServer' && has(s.servicePublishingStrategy.loadBalancer) && s.?servicePublishingStrategy.loadBalancer.hostname.orValue("") != "" && self.?configuration.apiServer.servingCerts.namedCertificates.orValue("[]").exists(cert, cert.?names.orValue("[]").exists(n, n == s.servicePublishingStrategy.loadBalancer.hostname)))`, message="APIServer loadBalancer hostname cannot be in ClusterConfiguration.apiserver.servingCerts.namedCertificates[]"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelSpeed I tried using CEL optional types as you suggested; but then make update & CRD generation fails consistently on this cost-estimation error: OpenAPIv3 schema, spec.validation.openAPIV3Schema: Forbidden: x-kubernetes-validations estimated rule cost total for entire OpenAPIv3 schema exceeds budget by factor of more than 100x (try simplifying the rule, or adding maxItems, maxProperties, and maxLength where arrays, maps, and strings are declared)]

  • upon digging more in this context, it turned out that CEL optional types (.?/.orValue()) have significantly higher estimated cost in the Kubernetes CRD cost estimator compared to has() guards. Hence, the has() guards worked out properly in local testing & make update also didn't fail.
  • The HostedCluster CRD is massive with many validation rules, so it seem to be having very little cost headroom. We've had similar issue in this repo earlier -> NO-JIRA: fix(api): replace CEL url() validators with regex to fix CRD cost budget

So I think we're good to go with has() guards for now.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have significantly higher estimated cost

Do you have any sources/links to explain this? Since it's a presence check, it should be fixed (low) cost 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root cause is likely in the cel-go cost estimator (cel-go/checker/cost.go); let me know if you agree with this hypothesis:

Comparison

Aspect has() guards .?/.orValue()
AST kind SelectKind (test-only) CallKind
Path tracking Preserved Lost — no path for select_optional_field
List size for exists() Bounded by schema Max: math.MaxUint64
Nested loop cost bounded * bounded MaxUint64 * MaxUint64

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking, this feels like a bug in the estimator to me, lets continue with what you have for now

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 1, 2026
@hypershift-jira-solve-ci
Copy link
Copy Markdown

hypershift-jira-solve-ci Bot commented May 1, 2026

I now have all the evidence needed for a complete analysis. Here is the report:

Test Failure Analysis Complete

Job Information

  • Prow Job: tide (merge gate)
  • Build ID: N/A — no CI job was executed; this is a tide merge-gate error
  • PR: #8331OCPBUGS-77827: fix(api): add missing has() guards to servingCerts CEL validation rule
  • Author: rutvik23
  • Branch: OCPBUGS-77827main

Test Failure Analysis

Error

tide: Not mergeable. PR has a merge conflict.

Summary

No Prow CI test jobs have failed. The error state is exclusively from the tide merge controller, which reports the PR has a git merge conflict with the main branch. PR #8265 ("AUTOSCALE-615: include Karpenter node vCPUs in billing metric") was merged into main on 2026-05-01T03:00:51Z, modifying 17 files that overlap with PR #8331 — primarily api/hypershift/v1beta1/hostedcluster_types.go and all auto-generated CRD manifests under zz_generated.featuregated-crd-manifests/. Additionally, all 6 e2e jobs (e2e-aks, e2e-aws, e2e-aws-upgrade-hypershift-operator, e2e-azure-self-managed, e2e-kubevirt-aws-ovn-reduced, e2e-v2-aws) show "Waiting for pipeline condition to trigger this job" — they are gated behind the lgtm label, which the PR does not yet have.

Root Cause

The tide "error" state is not a test failure — it is a merge-gate status indicating the PR cannot be merged in its current state due to two independent issues:

  1. Git merge conflict (primary blocker): PR AUTOSCALE-615: include Karpenter node vCPUs in billing metric #8265 ("AUTOSCALE-615: include Karpenter node vCPUs in billing metric") merged into main on 2026-05-01, modifying api/hypershift/v1beta1/hostedcluster_types.go and all 14 overlapping zz_generated.featuregated-crd-manifests/ YAML files, plus 3 zz_generated.crd-manifests/ files and the vendored copy. Since PR OCPBUGS-77827: fix(api): add missing has() guards to servingCerts CEL validation rule #8331's last commit was on 2026-04-29 (before AUTOSCALE-615: include Karpenter node vCPUs in billing metric #8265 merged), the branch is now out of date and has textual conflicts in the shared CRD definition files. The needs-rebase label was automatically applied by the Prow rebase plugin.

  2. Missing lgtm label (secondary blocker): The PR has approved but lacks lgtm. Without lgtm, the 6 e2e presubmit jobs are gated and will not trigger — they show "Waiting for pipeline condition to trigger this job." The tide status history confirms this: before the merge conflict arose, tide was already reporting "Needs lgtm, verified labels."

Both issues must be resolved before tide will attempt to merge the PR. The e2e jobs that appear as "pending" are not failing — they have never been triggered.

Recommendations
  1. Rebase the branch onto current main to resolve the merge conflict with PR AUTOSCALE-615: include Karpenter node vCPUs in billing metric #8265. Since both PRs modify hostedcluster_types.go and auto-generated CRD manifests, the rebase will require:

    • Manually resolving conflicts in api/hypershift/v1beta1/hostedcluster_types.go
    • Re-running CRD generation (make generate) to regenerate all zz_generated.featuregated-crd-manifests/ and zz_generated.crd-manifests/ files
    • Updating vendor/ copy: vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_types.go
  2. Obtain lgtm from a reviewer — the PR has approved but still needs lgtm to ungate the e2e presubmit jobs. Without this, even after rebasing, the e2e jobs will remain in "Waiting for pipeline condition" state.

  3. After rebase + lgtm, the 6 e2e jobs will trigger automatically and the needs-rebase label will be removed by the Prow rebase plugin.

Evidence
Evidence Detail
Tide status (current) "Not mergeable. PR has a merge conflict." — set at 2026-05-01T02:58:03Z
Tide status (prior) "Not mergeable. Must be by author dependabot[bot] OR ... Needs lgtm, verified labels." — set at 2026-04-29T12:09:06Z
Conflicting PR #8265 "AUTOSCALE-615: include Karpenter node vCPUs in billing metric" — merged 2026-05-01T03:00:51Z
Files in conflict 17 files overlap: hostedcluster_types.go, 13 zz_generated.featuregated-crd-manifests/*.yaml, 3 zz_generated.crd-manifests/*.yaml, vendored copy
PR #8331 last commit 2026-04-29T12:05:41Z (before #8265 merged)
needs-rebase label Present on PR — applied automatically by Prow
Missing lgtm label PR has approved but no lgtm; e2e jobs are gated behind this label
E2e job status (all 6) "Waiting for pipeline condition to trigger this job" — never started
Successful Prow jobs images, okd-scos-images, security, verify-deps, verify-workflows — all passed
Successful GH Actions All unit tests, envtests, lint, verify, codespell, gitlint — all passed

rutvik23 added 2 commits May 6, 2026 14:16
  The CEL rule validating that APIServer loadBalancer hostname is not in
  servingCerts namedCertificates fails with "no such key" when
  servingCerts or namedCertificates are not set. This adds has() guards
  for servingCerts, namedCertificates, and cert.names fields to prevent
  the error.

  Fixes: OCPBUGS-77827
  Signed-off-by: rutvik23 <rkshirsa@redhat.com>
Add two additional envtest cases to exercise each has() guard
added in the servingCerts CEL validation fix:

- servingCerts present but no namedCertificates: exercises
  has(self.configuration.apiServer.servingCerts.namedCertificates)
- namedCertificates entry with no names field: exercises
  has(cert.names)

Signed-off-by: rutvik23 <rkshirsa@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2026
@rutvik23 rutvik23 requested a review from jparrill May 6, 2026 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api Indicates the PR includes changes for the API area/cli Indicates the PR includes changes for CLI area/testing Indicates the PR includes changes for e2e testing jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants